2 research outputs found
Reward Imputation with Sketching for Contextual Batched Bandits
Contextual batched bandit (CBB) is a setting where a batch of rewards is
observed from the environment at the end of each episode, but the rewards of
the non-executed actions are unobserved, resulting in partial-information
feedback. Existing approaches for CBB often ignore the rewards of the
non-executed actions, leading to underutilization of feedback information. In
this paper, we propose an efficient approach called Sketched Policy Updating
with Imputed Rewards (SPUIR) that completes the unobserved rewards using
sketching, which approximates the full-information feedback. We formulate
reward imputation as an imputation regularized ridge regression problem that
captures the feedback mechanisms of both executed and non-executed actions. To
reduce time complexity, we solve the regression problem using randomized
sketching. We prove that our approach achieves an instantaneous regret with
controllable bias and smaller variance than approaches without reward
imputation. Furthermore, our approach enjoys a sublinear regret bound against
the optimal policy. We also present two extensions, a rate-scheduled version
and a version for nonlinear rewards, making our approach more practical.
Experimental results show that SPUIR outperforms state-of-the-art baselines on
synthetic, public benchmark, and real-world datasets.
Comment: Accepted by NeurIPS 202
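The abstract describes solving a ridge regression problem with randomized sketching to reduce time complexity. The following is a minimal, generic sketch-and-solve illustration of that idea, not the SPUIR algorithm: it omits the imputation regularizer and the bandit loop entirely. The Gaussian sketch matrix, the function names ridge and sketched_ridge, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam I) theta = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def sketched_ridge(X, y, lam, sketch_rows, rng):
    """Sketch-and-solve ridge regression (illustrative assumption).

    A Gaussian sketch S of shape (sketch_rows, n) compresses the n samples
    down to sketch_rows rows before the ridge problem is solved.
    """
    n = X.shape[0]
    S = rng.normal(size=(sketch_rows, n)) / np.sqrt(sketch_rows)
    return ridge(S @ X, S @ y, lam)

# Synthetic linear-reward data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
theta_true = np.arange(1.0, d + 1.0)
y = X @ theta_true + 0.1 * rng.normal(size=n)

theta_full = ridge(X, y, lam=1.0)
theta_sk = sketched_ridge(X, y, lam=1.0, sketch_rows=400, rng=rng)
print("gap between sketched and full solution:",
      np.linalg.norm(theta_sk - theta_full))
```

With a dense Gaussian sketch, forming S @ X costs about as much as the full solve, so this version only demonstrates the approximation. Practical speedups usually come from structured or sparse sketches, which are not shown here.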
Uncovering ChatGPT's Capabilities in Recommender Systems
The debut of ChatGPT has recently attracted the attention of the natural
language processing (NLP) community and beyond. Existing studies have
demonstrated that ChatGPT shows significant improvement in a range of
downstream NLP tasks, but the capabilities and limitations of ChatGPT in terms
of recommendations remain unclear. In this study, we aim to conduct an
empirical analysis of ChatGPT's recommendation ability from an Information
Retrieval (IR) perspective, including point-wise, pair-wise, and list-wise
ranking. To achieve this goal, we re-formulate the above three recommendation
policies into a domain-specific prompt format. Through extensive experiments on
four datasets from different domains, we demonstrate that ChatGPT outperforms
other large language models across all three ranking policies. Based on the
analysis of unit cost improvements, we identify that ChatGPT with list-wise
ranking achieves the best trade-off between cost and performance compared to
point-wise and pair-wise ranking. Moreover, ChatGPT shows potential for
mitigating the cold-start problem and for explainable recommendation. To facilitate
further explorations in this area, the full code and detailed original results
are open-sourced at https://github.com/rainym00d/LLM4RS
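The abstract distinguishes point-wise, pair-wise, and list-wise ranking expressed as prompts. The following is a rough sketch of how the three formulations differ in shape and in the number of prompts they need. The prompt wording, the function build_prompts, and the example items are hypothetical and are not taken from the paper or its repository. The pair-wise count assumes every pair of candidates is compared, which may not match the paper's setup.

```python
from itertools import combinations

def build_prompts(history, candidates):
    """Build illustrative prompts for three ranking formulations.

    All templates here are assumptions for illustration only.
    """
    liked = ", ".join(history)
    # Point-wise: one prompt per candidate, asking for a score.
    pointwise = [
        f"The user liked: {liked}. "
        f"Rate how much the user would like '{c}' on a scale of 1 to 5."
        for c in candidates
    ]
    # Pair-wise: one prompt per pair, asking which item is preferred.
    pairwise = [
        f"The user liked: {liked}. "
        f"Which would the user prefer: '{a}' or '{b}'?"
        for a, b in combinations(candidates, 2)
    ]
    # List-wise: a single prompt asking for a full ranking.
    listwise = [
        f"The user liked: {liked}. "
        f"Rank these items from most to least preferred: "
        f"{', '.join(candidates)}."
    ]
    return {"pointwise": pointwise, "pairwise": pairwise, "listwise": listwise}

prompts = build_prompts(
    history=["Item A", "Item B"],
    candidates=["Item C", "Item D", "Item E", "Item F"],
)
for name, batch in prompts.items():
    print(name, "needs", len(batch), "prompt(s)")
```

Under these assumptions, n candidates need n point-wise prompts, n(n-1)/2 pair-wise prompts, and a single list-wise prompt, which gives some intuition for why the cost per ranking can differ across the three formulations.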